8 research outputs found
Improved ESP-index: a practical self-index for highly repetitive texts
While several self-indexes for highly repetitive texts exist, developing a
practical self-index applicable to real world repetitive texts remains a
challenge. ESP-index is a grammar-based self-index on the notion of
edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees
upper bounds of parsing discrepancies between different appearances of the same
subtexts in a text. Although ESP-index performs efficient top-down searches of
query texts, it has a serious issue on binary searches for finding appearances
of variables for a query text, which resulted in slowing down the query
searches. We present an improved ESP-index (ESP-index-I) by leveraging the idea
behind succinct data structures for large alphabets. While ESP-index-I keeps
the same types of efficiencies as ESP-index about the top-down searches, it
avoid the binary searches using fast rank/select operations. We experimentally
test ESP-index-I on the ability to search query texts and extract subtexts from
real world repetitive texts on a large-scale, and we show that ESP-index-I
performs better that other possible approaches.Comment: This is the full version of a proceeding accepted to the 11th
International Symposium on Experimental Algorithms (SEA2014
Rank and Select Operations on Binary Strings
Rank and Select Operations on Binary String
Dynamizing Succinct Tree Representations
Abstract. We consider succinct, or space-efficient, representations of ordinal trees. Representations exist that take 2n + o(n) bits to represent a static n-node ordinal tree – close to the information-theoretic minimum – and support navigational operations in O(1) time on a RAM model; and some implementations have good practical performance. The situation is different for dynamic ordinal trees. Although there is theoretical work on succinct dynamic ordinal trees, there is little work on the practical performance of these data structures. Motivated by applications to representing XML documents, in this paper, we report on a preliminary study on dynamic succinct data structures. Our implementation is based on representing the tree structure as a sequence of balanced parentheses, with navigation done using the min-max tree of Sadakane and Navarro (SODA ’10). Our implementation shows promising performance for update and navigation, and our findings highlight two issues that we believe will be important to future implementations: the difference between the finger model of (say) Farzan and Munro (ICALP ’09) and the parenthesis model of Sadakane and Navarro, and the choice of the balanced tree used to represent the min-max tree.